DATA:
The authors have made a best-effort attempt to anonymize the following artifacts. Note that it may still be possible to identify the authors by studying the artifacts closely.
The zip archive containing all the data mentioned in this paper, covering 16 projects, is available here:
https://drive.google.com/file/d/1I9-h-CewYH6z05AKjprMNWO25iqG7Hro/view?usp=sharing (113MB zipped, 1.06GB unzipped)
RAW_DATA:
It contains many CSV files, which are the results of running GAsearch and brute-force search.
BASELINE_DATA:
It contains the results of homogeneous allocation for all projects. The data for the GitHub baseline and the smart baseline is sourced from here.
TEST_ALLOCATION:
It contains the test allocation details for the project corresponding to each result in RAW_DATA, that is, which test is allocated to run on which machine.
INTEGRATION_DATA:
It contains the data extracted from RAW_DATA and BASELINE_DATA.
The "category" column has the form "[machine number]-[failure rate limitation]", where the machine number is one of 1, 2, 4, 6, 8, 10, 12 and the failure rate limitation is one of 0, 0.2, 0.4, 0.6, 0.8, 1.
The value of "a" ranges from 0 to 1 in steps of 0.05 (21 values in total). The definition of "a" is consistent with that in the paper.
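For readers who want to work with the INTEGRATION_DATA files programmatically, the following is a minimal sketch of how a "category" value could be split into its two parts and how the grid of "a" values could be enumerated. It is not taken from the provided scripts: the function name parse_category and the constant names are hypothetical, and it assumes the two parts are separated by a single hyphen, as in "4-0.2".

```python
# Allowed values as listed in the description of the "category" column.
MACHINE_COUNTS = (1, 2, 4, 6, 8, 10, 12)
FAILURE_RATE_LIMITS = (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)

# The 21 values of "a": 0, 0.05, ..., 1 (rounded to avoid float drift).
A_VALUES = [round(i * 0.05, 2) for i in range(21)]


def parse_category(category):
    """Split a "[machine number]-[failure rate limitation]" string.

    Hypothetical helper: assumes a single hyphen separates the two parts.
    """
    machines_text, limit_text = category.split("-", 1)
    machines = int(machines_text)
    limit = float(limit_text)
    if machines not in MACHINE_COUNTS:
        raise ValueError(f"unexpected machine number: {machines}")
    if limit not in FAILURE_RATE_LIMITS:
        raise ValueError(f"unexpected failure rate limitation: {limit}")
    return machines, limit
```

For example, parse_category("4-0.2") would yield 4 machines and a failure rate limitation of 0.2, while a value outside the listed sets raises a ValueError.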
SCRIPTS:
The zip archive below contains all the script files, including the implementations of GAsearch and the other analyses.
https://drive.google.com/file/d/1QQCbAyOMmfabPTIHtplVgFvLOiVbVVoD/view?usp=sharing (18.1KB zipped, 74KB unzipped)
preproc.py obtains the time, price and failure rate of running each test across different configurations.
proc_maven_log.py extracts the set-up time per project across different configurations from the Maven logs.
analyze.py contains the specific algorithm implementations for GAsearch and brute-force.
plotter.py is the script for plotting.
generate_proj_info.py is the script for generating information for each project.
integrate_dat.py, comp.py and addi_analyze.py are scripts used for extracting the data.